Abstract

We consider the occurrence of record-breaking events in random walks with asymmetric jump 
distributions. The statistics of records in symmetric random walks was previously analyzed by 
Majumdar and Ziff [1] and is well understood. Unlike the case of symmetric jump distributions,
in the asymmetric case the statistics of records depends on the choice of the jump distribution.
We compute the record rate $P_n(c)$, defined as the probability for the $n$th value to be larger than
all previous values, for a Gaussian jump distribution with standard deviation $\sigma$ that is shifted by
a constant drift $c$. For small drift, in the sense of $c/\sigma \ll n^{-1/2}$, the correction to $P_n(c)$ grows
proportional to $\arctan(\sqrt{n})$ and saturates at the value $c/(\sqrt{2}\sigma)$. For large $n$ the record rate approaches
a constant, which is approximately given by $1 - (\sigma/\sqrt{2\pi}c)\exp(-c^2/2\sigma^2)$ for $c/\sigma \gg 1$. These
asymptotic results carry over to other continuous jump distributions with finite variance. As an
application, we compare our analytical results to the record statistics of 366 daily stock prices from
the Standard & Poor's 500 index. The biased random walk accounts quantitatively for the increase
in the number of upper records due to the overall trend in the stock prices, and after detrending
the number of upper records is in good agreement with the symmetric random walk. However,
the number of lower records in the detrended data is significantly reduced by a mechanism that
remains to be identified.

PACS numbers: 05.40.-a, 02.50.Ey, 89.65.Gh 
I. INTRODUCTION



The random walk is a paradigmatic model of statistical physics, which combines utmost
conceptual simplicity with a surprising richness of emergent behaviors [2, 3]. Among the
many interesting features of random walks, recent research has focused in particular on their
extremal properties, exploring quantities such as the height and position of the globally
maximal excursion of a one-dimensional walk of a given number of steps, and the statistics
of records of this process [4]. Here a record is defined as an entry in a discrete, real-valued
time series that is larger (upper record) or smaller (lower record) than all previous
entries. While the mathematical theory of records is well developed for time series of
independent, identically distributed random variables [5-7], little has been known about the
record statistics of correlated processes. It is therefore remarkable that records of a large
class of one-dimensional random walks can be characterized in considerable detail, as was
shown in recent work by Majumdar and Ziff (MZ) [1]. Specifically, they considered the
random process defined by

$$X_n = X_{n-1} + \xi_n, \tag{1}$$

where $X_0 = 0$ (say) and the step sizes $\xi_n$ are independent, identically distributed random
variables drawn from a probability density $\phi(\xi)$ that is required to be continuous and
symmetric, but is otherwise arbitrary. We say that an upper record occurs at time $n$ if
$X_n = \max\{X_0, \ldots, X_n\}$. Based on the Sparre Andersen theorem for the survival probability
of the random walk [8-11], MZ show that the probability $\Pi(m,n)$ for the $n$th event to be
the $m$th record is given by



$$\Pi(m,n) = \binom{2n-m+1}{n}\, 2^{-2n+m-1} \tag{2}$$

for $m \le n + 1$. The first moment of this distribution with respect to $m$ yields the mean
number of records after $n$ steps, which equals $m_n \approx \frac{2}{\sqrt{\pi}}\sqrt{n}$ for large $n$, and the probability
$P_n$ for the $n$th event to be a record (henceforth referred to as the record rate) decays like
$P_n \approx \frac{1}{\sqrt{\pi n}}$. In the present paper we aim to generalize these results to random walks with
asymmetric jump distributions. In the first part of the paper (Sections II and III) we
study records generated by random walks with a symmetric jump distribution that have an
additional constant drift $c$, such that (1) generalizes to

$$X_n = X_{n-1} + \xi_n + c \tag{3}$$

with a symmetric jump distribution $\phi(\xi)$. For the special case of a Cauchy distribution this
problem was considered previously in [12]. Here, we derive approximate results for the case
of a Gaussian jump distribution that apply also more generally to distributions with a finite
variance.
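As a quick consistency check of Eq. (2), the following Python sketch evaluates $\Pi(m,n) = \binom{2n-m+1}{n} 2^{-2n+m-1}$ with exact rational arithmetic, confirms that it is normalized over $1 \le m \le n+1$, and compares its first moment with the closed form $(2n+1)\binom{2n}{n}2^{-2n}$, which reduces to $2\sqrt{n/\pi}$ for large $n$. The function names are illustrative and are not taken from [1].

```python
from fractions import Fraction
from math import comb


def record_number_distribution(n):
    """Return [Pi(1, n), ..., Pi(n + 1, n)] from Eq. (2) as exact fractions."""
    return [
        Fraction(comb(2 * n - m + 1, n), 2 ** (2 * n - m + 1))
        for m in range(1, n + 2)
    ]


def mean_record_number(n):
    """First moment of Eq. (2) with respect to the record number m."""
    probabilities = record_number_distribution(n)
    return sum(m * p for m, p in enumerate(probabilities, start=1))


if __name__ == "__main__":
    for n in (1, 2, 10, 50):
        total = sum(record_number_distribution(n))
        closed_form = Fraction((2 * n + 1) * comb(2 * n, n), 4 ** n)
        # total should be exactly 1; the two means should coincide
        print(n, total, float(mean_record_number(n)), float(closed_form))
```

Exact fractions are used here so that the normalization can be tested with `==` rather than with a floating-point tolerance.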

Similar to our earlier work [13, 14] on the related problem of records from independent
random variables with drift [12, 15], our strategy will be to analyze the limiting cases of
small and large drift, respectively, as quantified by the ratio $c/\sigma$ of the drift speed to the
standard deviation $\sigma$ of the jump distribution $\phi(\xi)$. For the Gaussian random walk we find
that in the limit of $c/\sigma \ll 1/\sqrt{n}$ the mean number of records and the record rate are given by

$$m_n(c) \approx \frac{2\sqrt{n}}{\sqrt{\pi}} + \frac{c}{\sigma}\frac{\sqrt{2}}{\pi}\left(n \arctan(\sqrt{n}) - \sqrt{n}\right), \tag{4}$$

$$P_n(c) \approx \frac{1}{\sqrt{\pi n}} + \frac{c}{\sigma}\frac{\sqrt{2}}{\pi}\arctan(\sqrt{n}). \tag{5}$$

In the limit of $c/\sigma \gg 1/\sqrt{n}$ the record rate $P_n(c)$ approaches a constant value. If in addition
$c/\sigma \gg 1$, this constant is given approximately by

$$\lim_{n\to\infty} P_n \approx 1 - \frac{\sigma}{\sqrt{2\pi}\,c}\, e^{-\frac{c^2}{2\sigma^2}}. \tag{6}$$



In Section IV we apply our results to fluctuations in stock prices, arguably one of the
most important (and ancient) applications of random walk theory [16-18]. The basic model
of a stock price is the geometric random walk $S_n = e^{X_n}$ with an upward bias reflecting
long-term economic growth. Our analysis of record events in the Standard & Poor's 500
index shows a corresponding surplus of upper record events, which is consistent with the
theoretical expectation. However, an asymmetry between upper and lower records remains
even when the bias has been (approximately) removed [19], a feature that may be related to
the gain-loss asymmetry reported in previous analyses of stock market fluctuations [20-23].
We conclude with a summary and a discussion of some open problems.

II. SURVIVAL PROBABILITIES AND FIRST PASSAGE TIMES 

The record statistics of a random walk can be analyzed by considering the generating
functions of the survival and first passage probabilities of the process [1, ?, 12]. In [1] it was
shown that the generating function of $\Pi(m,n)$ is of the form

$$\sum_{n=m-1}^{\infty} \Pi(m,n)\, z^n = f_-^{\,m-1}(z)\, q_-(z), \tag{7}$$

where $q_\pm(z)$ is the generating function of the positive (negative) survival probability $q_\pm(n)$ of
the random walk. The latter is defined as the probability that the process stays above (below)
the origin up to the $n$th step. Similarly, $f_\pm(z)$ is the generating function of the positive
(negative) first-passage probability $f_\pm(n)$ of the random walk, with $f_\pm(n) = q_\pm(n-1) - q_\pm(n)$.
In the case of the symmetric random walk considered in [1] we have $q_-(n) = q_+(n) = q(n)$
and $f_-(n) = f_+(n) = f(n)$, and both $q(n)$ and $f(n)$ are completely universal for all
continuous jump distributions.

Since we want to study asymmetric random walks, we need to distinguish between positive
and negative survival probabilities and first passage times, and consider the functions $q_\pm(n)$
and $f_\pm(n)$. As in [1], a theorem by Sparre Andersen will play a key role in our considerations.
In [11] it was shown that

$$q_\pm(z) = \sum_{n=0}^{\infty} q_\pm(n)\, z^n = \exp\left(\sum_{n=1}^{\infty} \frac{p_\pm(n)}{n}\, z^n\right), \tag{8}$$

where $p_\pm(n)$ is the probability for the walker to be above or below the origin at the $n$th
step. This quantity can be easily computed from $p_\pm(n) = \int_0^\infty G(\pm x, n)\, dx$, where $G(x,n)$ is
the positional probability density of a random walk of $n$ steps that started at the origin.
Details on the computation of $G(\pm x, n)$ and $p_\pm(n)$ can be found in [4] and [10]. In the case
of a symmetric random walk we simply have $p_\pm(n) = \frac{1}{2}$ independent of $n$, and we find that
in this case $q_\pm(z) = (1-z)^{-1/2}$ and $q_\pm(n) = \binom{2n}{n} 2^{-2n}$. These results eventually lead to
Eq. (2).
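Equation (8), $q_\pm(z) = \exp\left(\sum_{n\ge 1} p_\pm(n) z^n / n\right)$, can be turned into a simple recursion: differentiating both sides with respect to $z$ and comparing coefficients gives $n\, q_\pm(n) = \sum_{k=1}^{n} p_\pm(k)\, q_\pm(n-k)$ with $q_\pm(0) = 1$. The sketch below (function name hypothetical) implements this recursion and checks that the symmetric case $p_\pm(n) = 1/2$ reproduces $q(n) = \binom{2n}{n} 2^{-2n}$.

```python
from math import comb


def survival_probabilities(sign_probability, n_max):
    """Coefficients q(0), ..., q(n_max) of exp(sum_n p(n) z^n / n).

    sign_probability(k) must return p(k) for k >= 1.
    """
    q = [1.0]
    for n in range(1, n_max + 1):
        # n q(n) = sum_{k=1}^{n} p(k) q(n - k)
        q.append(sum(sign_probability(k) * q[n - k] for k in range(1, n + 1)) / n)
    return q


if __name__ == "__main__":
    q_symmetric = survival_probabilities(lambda k: 0.5, 20)
    for n in (1, 5, 20):
        # recursion result next to the closed form binom(2n, n) / 4^n
        print(n, q_symmetric[n], comb(2 * n, n) / 4 ** n)
```

The recursion costs $O(n^2)$ operations, which is unproblematic for the walk lengths of order $10^2$ to $10^3$ considered here.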

In the case of an asymmetric random walk the situation gets a bit more complicated.
We compute $q_\pm(n)$ and its generating function for a Gaussian random walk with drift
$c$. Here, the jump distribution of the symmetric random variable $\xi$ in (3) is of the form
$\phi(\xi) = \frac{1}{\sqrt{2\pi}\sigma} e^{-\xi^2/2\sigma^2}$ with standard deviation $\sigma$. It is easy to show that the probability density
$G(x,n)$ of the random walk after $n$ steps is given by $G(x,n) = \frac{1}{\sqrt{2\pi n}\,\sigma} \exp\left(-\frac{(x-cn)^2}{2n\sigma^2}\right)$ and
$p_\pm(n) = \frac{1}{2}\left(1 \pm \mathrm{erf}\left(\frac{c\sqrt{n}}{\sqrt{2}\sigma}\right)\right)$. We start with the case of a small linear drift with $c \ll \sigma/\sqrt{n}$ such
that $p_\pm(n) \approx \frac{1}{2} \pm \frac{c}{\sqrt{2\pi}\sigma}\sqrt{n}$. Now we can employ Eq. (8). Expanding up to first order in $c$ we
find

$$q_\pm(z) \approx \frac{1}{\sqrt{1-z}}\left(1 \pm \frac{c}{\sqrt{2\pi}\sigma}\sum_{n=1}^{\infty}\frac{z^n}{\sqrt{n}}\right). \tag{9}$$

FIG. 1: Relative effect of the drift on the positive survival probability $q_+(n,c)$ of a Gaussian
random walk with $\sigma = 1$. The effect of the drift is represented by the difference $q_+(n,c) - q_+(n,0)$,
normalized by the drift, for different drift speeds $c$. We simulated $10^7$ realizations of a random walk
with $n = 100$ steps for each drift speed. The dotted line represents the analytical result obtained in
Eq. (13). For small drift, $c = 0.001$ and $c = 0.01$, we find good agreement with this approximation.

With $(1-z)^{-1/2} = \sum_{n=0}^{\infty} \binom{2n}{n} 2^{-2n} z^n$ and making use of the Cauchy formula for products of
infinite sums we obtain the following expression for $q_\pm(z)$:



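$$q_\pm(z) \approx \frac{1}{\sqrt{1-z}} \pm \frac{c}{\sqrt{2\pi}\sigma}\sum_{n=0}^{\infty}\sum_{k=0}^{n}\binom{2k}{k}\frac{2^{-2k}}{\sqrt{n-k+1}}\, z^{n+1}. \tag{10}$$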



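The binomial coefficient can be approximated by $\binom{2k}{k} \approx 4^k/\sqrt{\pi k}$ and with this we can
approximate the sum over $k$ by an integral,

$$\sum_{k=0}^{n}\binom{2k}{k}\frac{2^{-2k}}{\sqrt{n-k+1}}\, z^{n+1} \approx \frac{1}{\sqrt{\pi}}\int_0^n \frac{dk\; z^{n+1}}{\sqrt{k}\,\sqrt{n-k+1}}. \tag{11}$$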



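Since the integral over $k$ approaches $\pi$ for large $n$, we thus obtain a simple result for the
generating function of the survival probability,

$$q_\pm(z) \approx \frac{1}{\sqrt{1-z}} \pm \frac{c}{\sqrt{2}\sigma}\sum_{n=1}^{\infty} z^n, \tag{12}$$

and finally the following expression for the survival probability $q_\pm(n)$ under a small linear
drift:

$$q_\pm(n) \approx \frac{1}{\sqrt{\pi n}} \pm \frac{c}{\sqrt{2}\sigma}. \tag{13}$$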

The first term on the right hand side is the result for the symmetric random walk discussed
in [1], which is now supplemented by a correction linear in $c/\sigma$. Although this particular result
for $q_\pm(n)$ will not be needed in our derivation of the record statistics, we found it useful to
test it against numerical simulations. The results are shown in Fig. 1. For small $c$, Eq. (13)
is in good agreement with the simulations.
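Besides Monte Carlo simulation, the small-drift result $q_\pm(n) \approx 1/\sqrt{\pi n} \pm c/(\sqrt{2}\sigma)$ of Eq. (13) can also be compared with a direct numerical evaluation of Eq. (8). The sketch below assumes the Gaussian form $p_\pm(n) = \frac{1}{2}\left(1 \pm \mathrm{erf}(c\sqrt{n}/\sqrt{2}\sigma)\right)$ and uses the recursion $n\, q_\pm(n) = \sum_{k=1}^{n} p_\pm(k)\, q_\pm(n-k)$ that follows from differentiating Eq. (8); all function names are hypothetical.

```python
from math import erf, pi, sqrt


def gaussian_sign_probability(n, c, sigma=1.0, sign=+1):
    """p_+(n) (sign=+1) or p_-(n) (sign=-1) for a Gaussian walk with drift c."""
    return 0.5 * (1.0 + sign * erf(c * sqrt(n) / (sqrt(2.0) * sigma)))


def gaussian_survival(c, sigma, n_max, sign=+1):
    """q_+(n) or q_-(n) for n = 0..n_max from the recursion implied by Eq. (8)."""
    p = [0.0] + [
        gaussian_sign_probability(k, c, sigma, sign) for k in range(1, n_max + 1)
    ]
    q = [1.0]
    for n in range(1, n_max + 1):
        q.append(sum(p[k] * q[n - k] for k in range(1, n + 1)) / n)
    return q


def survival_small_drift(n, c, sigma=1.0, sign=+1):
    """Small-drift approximation of Eq. (13)."""
    return 1.0 / sqrt(pi * n) + sign * c / (sqrt(2.0) * sigma)


if __name__ == "__main__":
    n, sigma = 100, 1.0
    for c in (0.001, 0.01, 0.1):
        numeric = gaussian_survival(c, sigma, n)[n]
        # numerical q_+(n) next to the approximation of Eq. (13)
        print(c, numeric, survival_small_drift(n, c, sigma))
```

The comparison is only expected to be close while $c\sqrt{n}/\sigma$ is small, which is the regime in which Eq. (13) was derived.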

For the sake of completeness we also provide the small $c$ expansion of $f_\pm(z)$ that will
become important later. With $f_\pm(z) = 1 - (1-z)\, q_\pm(z)$ we find

$$f_\pm(z) \approx 1 - \sqrt{1-z}\left(1 \pm \frac{c}{\sqrt{2\pi}\sigma}\sum_{n=1}^{\infty}\frac{z^n}{\sqrt{n}}\right). \tag{14}$$

From this we obtain, by methods very similar to those used above to derive $q_\pm(n)$, the result

/±(n) w-^=n-i±-^— n-3. (15) 

Next we consider the case of large drift, $c/\sigma \gg 1$. Here we will only discuss $q_-(n)$, as this
is the quantity needed for the computation of the record rate; $q_+(n)$ has a different behavior
in this regime. In the limit of $c/\sigma \gg 1$ we find that $p_-(n) \approx \frac{\sigma}{\sqrt{2\pi n}\,c}\, e^{-\frac{c^2}{2\sigma^2} n}$. Using this
we find that for large $n$, $q_-(z)$ and $q_-(n)$ are of the form

$$q_-(z) \approx 1 + \sum_{n=1}^{\infty} \frac{\sigma}{c\sqrt{2\pi n^3}}\, e^{-\frac{c^2}{2\sigma^2} n}\, z^n, \tag{16}$$

$$q_-(n) \approx \frac{\sigma}{c\sqrt{2\pi n^3}}\, e^{-\frac{c^2}{2\sigma^2} n}. \tag{17}$$

These particular results were already reported in [12]. At this point it is important to notice
that all results concerning the first-passage and survival probabilities in the large $n$ limit
are easily transferable to other jump distributions as long as these have a finite variance.
Because of the central limit theorem, $G(\pm x, n)$ and therefore $p_\pm(n)$ will approach the same
expressions for large $n$ as were derived here for the Gaussian jump distribution.
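The large-drift expression for $p_-(n)$ used above follows from the standard asymptotic expansion of the complementary error function. A brief sketch of this step, assuming $c\sqrt{n}/\sigma \gg 1$ and a Gaussian jump distribution:

```latex
p_-(n) = \frac{1}{2}\,\mathrm{erfc}\!\left(\frac{c\sqrt{n}}{\sqrt{2}\,\sigma}\right),
\qquad
\mathrm{erfc}(x) \approx \frac{e^{-x^2}}{\sqrt{\pi}\,x} \quad (x \gg 1)
\quad\Longrightarrow\quad
p_-(n) \approx \frac{\sigma}{\sqrt{2\pi n}\,c}\, e^{-\frac{c^2}{2\sigma^2} n},
\qquad
\frac{p_-(n)}{n} \approx \frac{\sigma}{c\sqrt{2\pi n^3}}\, e^{-\frac{c^2}{2\sigma^2} n}.
```

Inserting $p_-(n)/n$ into Eq. (8) and expanding the exponential to first order gives the coefficients that appear in Eqs. (16) and (17).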
III. GAUSSIAN RANDOM WALKS WITH DRIFT

A. Record rate for small $c/\sigma$ and $n \ll (\sigma/c)^2$

With the small $c$ expansions in Eqs. (9) and (14) we have all ingredients needed to
derive the record statistics for a Gaussian random walk with a small linear drift. We
start by computing the mean number of records $m_n$ expected up to the $n$th step. For
the generating function $m(z) = \sum_{n=0}^{\infty} m_n z^n$ of this quantity it was found in [12] that
$m(z) = 1/\left((1-z)^2 q_-(z)\right)$, a result that can be obtained by computing the first moment
of Eq. (7). We can now evaluate this expression making use of the generating function for
$q_-(n)$ given in Eq. (9). In the limit of small $c/\sigma$ this yields

$$m(z) \approx \frac{1}{(1-z)^{3/2}}\left(1 + \frac{c}{\sqrt{2\pi}\sigma}\sum_{n=1}^{\infty}\frac{z^n}{\sqrt{n}}\right). \tag{18}$$



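Using the series expansion of $(1-z)^{-3/2}$ and employing once again the Cauchy formula for
infinite sums and the Stirling approximation, we find

$$m(z) \approx \frac{1}{(1-z)^{3/2}} + \frac{\sqrt{2}\,c}{\pi\sigma}\sum_{n=1}^{\infty} z^n \sum_{k=0}^{n-1}\frac{\sqrt{k}}{\sqrt{n-k}}. \tag{19}$$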

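If $n$ is not too small, the sum over $k$ can be replaced by an integral and we finally obtain an
approximate expression for the generating function of $m_n$,

$$m(z) \approx \frac{1}{(1-z)^{3/2}} + \frac{\sqrt{2}\,c}{\pi\sigma}\sum_{n=1}^{\infty} z^n \left(n \arctan(\sqrt{n}) - \sqrt{n}\right). \tag{20}$$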



The mean number of records of the random walk with a small linear drift $c$ is therefore
approximately given by

$$m_n \approx \binom{2n}{n}\frac{2n+1}{2^{2n}} + \frac{\sqrt{2}\,c}{\pi\sigma}\left(n \arctan(\sqrt{n}) - \sqrt{n}\right). \tag{21}$$



Making use of the Stirling approximation this yields the previously announced expression
(4) for $m_n(c)$ and, by taking a derivative with respect to $n$, the record rate $P_n(c)$ in the
large $n$ limit as given in Eq. (5). The leading order correction of the record rate due to the
drift is seen to increase with $\arctan(\sqrt{n})$ and for larger $n$ it approaches a constant value.
For large $n$ (but still in the regime $c/\sigma \ll 1/\sqrt{n}$) we find the simple result

$$P_n(c) \approx \frac{1}{\sqrt{\pi n}} + \frac{c}{\sqrt{2}\sigma}. \tag{22}$$
FIG. 2: Relative effect of the drift on the record rate $P_n(c)$ of a random walk with a Gaussian
jump distribution ($\sigma = 1$). The effect is represented by the difference $P_n(c) - P_n(0)$, normalized
by the drift, for different drift speeds $c$. Again, we simulated $10^7$ realizations of a random walk
with $n = 100$ steps for each drift speed. The line represents the analytical result obtained in
Eq. (5). For small drift speeds $c = 0.001$ and $c = 0.01$ we find good agreement with the
approximation, but for $c = 0.1$ the approximation is no longer accurate.

We compared Eq. (5) to simulations and found good agreement in the regime $c/\sigma \ll 1/\sqrt{n}$ (Fig.
2). We also compared this result with numerical simulations of the record rate for random
walks with step sizes drawn from a uniform distribution (Fig. 3). The results for the
Gaussian and the uniform distribution are very similar to each other already for small $n$,
reflecting the convergence expected from the central limit theorem.
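The relation $m(z) = 1/\left((1-z)^2 q_-(z)\right)$ together with Eq. (8) also allows a numerical evaluation of the record rate without simulation, which may serve as an independent check of Eq. (5), $P_n(c) \approx 1/\sqrt{\pi n} + (c/\sigma)(\sqrt{2}/\pi)\arctan(\sqrt{n})$. Writing $1/q_-(z) = \exp\left(-\sum_n p_-(n) z^n/n\right) = \sum_n r_n z^n$, the coefficients follow from $n\, r_n = -\sum_{k=1}^{n} p_-(k)\, r_{n-k}$ with $r_0 = 1$, and $P_n = m_n - m_{n-1} = \sum_{j=0}^{n} r_j$. The sketch below assumes the Gaussian form of $p_-(n)$; the function names are illustrative.

```python
from math import atan, comb, erf, pi, sqrt


def exact_record_rates(c, sigma, n_max):
    """Record rates P_0, ..., P_{n_max} from m(z) = 1 / ((1 - z)^2 q_-(z))."""
    p_minus = [0.0] + [
        0.5 * (1.0 - erf(c * sqrt(k) / (sqrt(2.0) * sigma)))
        for k in range(1, n_max + 1)
    ]
    # r[n]: coefficients of 1/q_-(z) = exp(-sum_k p_-(k) z^k / k)
    r = [1.0]
    for n in range(1, n_max + 1):
        r.append(-sum(p_minus[k] * r[n - k] for k in range(1, n + 1)) / n)
    # P_n = m_n - m_{n-1} is the partial sum r_0 + ... + r_n
    rates, partial_sum = [], 0.0
    for coefficient in r:
        partial_sum += coefficient
        rates.append(partial_sum)
    return rates


def record_rate_small_drift(n, c, sigma=1.0):
    """Small-drift approximation of Eq. (5)."""
    return 1.0 / sqrt(pi * n) + (c / sigma) * (sqrt(2.0) / pi) * atan(sqrt(n))


if __name__ == "__main__":
    n = 100
    # Drift-free check: P_n(0) should equal binom(2n, n) / 4^n
    print(exact_record_rates(0.0, 1.0, n)[n], comb(2 * n, n) / 4 ** n)
    for c in (0.001, 0.01, 0.1):
        print(c, exact_record_rates(c, 1.0, n)[n], record_rate_small_drift(n, c))
```

For the symmetric walk the recursion reduces to $P_n = \binom{2n}{n} 2^{-2n}$, which provides a simple sanity check before a drift is switched on.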

B. Asymptotic record rate for large n 

Next we consider the limit of strong drift, $c/\sigma \gg 1$. Applying the same method as above
and making use of our result (16) for $q_-(z)$ in the regime of large $c/\sigma$, we find that the
number of records increases linearly with time according to

$$m_n(c) \approx \left(1 - \frac{\sigma}{\sqrt{2\pi}\,c}\, e^{-\frac{c^2}{2\sigma^2}}\right) n. \tag{23}$$
FIG. 3: Relative effect of the drift on the record rate $P_n(c)$ for a random walk with a uniform
jump distribution with standard deviation $\sigma = 1$. The parameters of the simulation are the same
as in Fig. 2. Even though the expression (5) was derived for a Gaussian jump distribution, it is in
good agreement with the numerical results for small $c$.

Correspondingly the record rate $P_n$ is independent of $n$ in this case. In fact, simulations
show that the record rate approaches a finite, nonzero limit $P(c) = \lim_{n\to\infty} P_n(c)$
for any positive value of the drift (Fig. 4). This can be understood, on the basis of the
general relation (7) between the distribution of record events and the negative first passage
probability, to be a consequence of the fact that the negative mean first passage time of
a random walk with positive drift is finite [9, 10]; roughly speaking, one expects that the
asymptotic record rate $P(c)$ is proportional to the inverse of the negative mean first passage
time. The result (23) implies that the asymptotic record rate behaves as



$$P(c) \approx 1 - \frac{\sigma}{\sqrt{2\pi}\,c}\, e^{-\frac{c^2}{2\sigma^2}} \tag{24}$$



for large $c/\sigma$ (see inset of Fig. 4). Furthermore, since the negative mean first passage time
diverges as $c^{-1}$ for $c \to 0$ [9, 10], the asymptotic record rate should behave as $P(c) \sim c$ for
small $c$. This is confirmed by the simulations, which indicate that $P(c) \approx 1.39\,(c/\sigma)$ for
$c/\sigma \ll 1$.
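Since $q_-(z)$ stays finite at $z = 1$ for positive drift, the relation $m(z) = 1/\left((1-z)^2 q_-(z)\right)$ suggests $m_n \approx n/q_-(1)$ for large $n$, so that, by Eq. (8), $P(c) = \exp\left(-\sum_{n \ge 1} p_-(n)/n\right)$. This step is an inference from the generating function relation, not a formula stated in the text. Under this assumption, and with the Gaussian form $p_-(n) = \frac{1}{2}\mathrm{erfc}(c\sqrt{n}/\sqrt{2}\sigma)$, the following sketch evaluates the asymptotic record rate numerically and compares it with the large-drift expression $1 - (\sigma/\sqrt{2\pi}c)\, e^{-c^2/2\sigma^2}$ of Eq. (24). Function names and the truncation parameter are hypothetical.

```python
from math import erfc, exp, pi, sqrt


def asymptotic_record_rate(c, sigma=1.0, cutoff=40.0):
    """P(c) = 1/q_-(1) = exp(-sum_{n>=1} p_-(n)/n) for a Gaussian walk, c > 0.

    The sum is truncated once c^2 n / sigma^2 exceeds `cutoff`; beyond that
    point each term p_-(n)/n is smaller than about 1e-9.
    """
    n_max = int(cutoff * (sigma / c) ** 2) + 10
    total = sum(
        0.5 * erfc(c * sqrt(n) / (sqrt(2.0) * sigma)) / n
        for n in range(1, n_max + 1)
    )
    return exp(-total)


def large_drift_record_rate(c, sigma=1.0):
    """Large-drift approximation of Eq. (24)."""
    return 1.0 - sigma / (sqrt(2.0 * pi) * c) * exp(-c ** 2 / (2.0 * sigma ** 2))


if __name__ == "__main__":
    for c in (1.0, 2.0, 3.0):
        print(c, asymptotic_record_rate(c), large_drift_record_rate(c))
    for c in (0.05, 0.1):
        # ratio P(c) / (c / sigma) in the small-drift regime (sigma = 1)
        print(c, asymptotic_record_rate(c) / c)
```

The ratio printed for small $c$ can be set against the fitted coefficient quoted above, while the first loop shows how quickly Eq. (24) becomes accurate as $c/\sigma$ grows.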

FIG. 4: Record rate for a biased Gaussian random walk with standard deviation $\sigma = 1$. The figure
illustrates the convergence of $P_n(c)$ to the asymptotically constant record rate $P(c)$ for $n \to \infty$.
The inset shows that the large drift result (24) becomes accurate for $c/\sigma > 1$, and the bold dotted
line shows that $P(c) \approx 1.39\, c/\sigma$ for $c \to 0$.

FIG. 5: Illustration of the conjectured scaling collapse (26) of the record rate $P_n(c)$ for Gaussian
random walks with $\sigma = 1$ and various drift speeds $c < 0.1$.

The time scale $n^*(c)$ at which the saturation of the record rate occurs can be estimated
by comparing the two terms in Eq. (22), which shows that

$$n^* \sim \left(\frac{\sigma}{c}\right)^2 \tag{25}$$

for small $c$. Not surprisingly, this is also the time scale at which the drift begins to dominate
the mean square displacement of the random walk. Together with the linear behavior of the
asymptotic record rate, this suggests the scaling form

$$P_n(c) = \frac{c}{\sigma}\, g\left((c/\sigma)^2 n\right) \tag{26}$$

for small $c/\sigma$ and arbitrary $n$, where the limiting behaviors of the scaling function are
$g(x \to 0) \approx 1/\sqrt{\pi x}$ and $g(x \to \infty) \approx 1.39$. This relation is well fulfilled by the numerical data
shown in Fig. 5.
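For orientation, equating the two terms of Eq. (22), $1/\sqrt{\pi n}$ and $c/(\sqrt{2}\sigma)$, gives an explicit estimate of the crossover time, including the prefactor that the scaling statement (25) leaves open. The numbers below are simple arithmetic for two of the drift values used in Fig. 2:

```latex
\frac{1}{\sqrt{\pi n^*}} = \frac{c}{\sqrt{2}\,\sigma}
\quad\Longrightarrow\quad
n^* = \frac{2}{\pi}\left(\frac{\sigma}{c}\right)^2,
\qquad
n^*\big|_{c/\sigma = 0.01} \approx 6.4 \times 10^{3},
\qquad
n^*\big|_{c/\sigma = 0.1} \approx 64.
```

With $n = 100$ steps, the case $c = 0.01$ therefore lies well inside the small-drift regime, whereas $c = 0.1$ does not, in line with the deviations noted in the caption of Fig. 2.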



IV. RECORD STATISTICS OF STOCK PRICES IN THE S&P 500 



A prominent application of the random walk process can be found in the financial sciences.
Originally introduced by Bachelier in 1900 [16], the geometric random walk is the standard
model used to describe the evolution of stock prices. In the application of this model to
actual data, trends are always an issue, which in the simplest case are described by a linear
drift in the logarithm of the stock price. In this section we present an empirical analysis
of record events in historical stock prices taken from the Standard & Poor's 500 index, and
compare the results to the theoretical predictions derived above. The observational data we
used consist of daily recordings of 366 stocks that were contained in the index from January
1990 to March 2009, resulting in 366 time series of length $n = 5000$ [24]. We first analyzed
the recordings without any detrending and then considered detrended data in which a fitted
linear trend was subtracted from the logarithms of the stock prices.

FIG. 6: Mean number of records averaged over 366 stocks from the S&P 500 index, computed
from daily values for the time period 1.1.1990 - 31.3.2009. Full thick line shows the number of
upper records, dotted thick line shows the number of lower records in the data set. The expected
number of records $m_n(0) = \frac{2}{\sqrt{\pi}}\sqrt{n}$ for a symmetric random walk is shown by the thin dotted line.
Also shown are the predictions of the biased random walk model with effective normalized drift
$c/\sigma = 0.025$ obtained from Monte Carlo simulations as well as from the approximate expression
$m_n(c) = m_n(0) + \frac{c\,n}{\sqrt{2}\sigma}$ (thin full line).

In the raw stock data the number of upper records after $n = 5000$ trading days is
considerably larger than the expected number of $2\sqrt{5000/\pi} \approx 79.79$ for a symmetric random
walk. At the end of the observation period, we found an average number of 166.56 upper
records in the stocks, but only 22.33 lower records. The rate of upper records was roughly
constant over the entire period, whereas the rate of lower records was almost zero already
after 300 days. Apparently a positive trend had a very strong effect on the record statistics
of the analyzed stocks. To quantify the trend, we performed a linear regression analysis
on the logarithms of the individual stock prices, determining the drift $c_i$ and the standard
deviation of increments $\sigma_i$ for each stock $i = 1, \ldots, 366$. The normalized drift $c_i/\sigma_i$ was then
averaged over all stocks, yielding the estimate $\langle c_i/\sigma_i \rangle \approx 0.025$. At $n = 5000$ we are thus well
outside the regime in which the perturbative result (4) should be valid. Still, inserting the
estimated normalized drift $c/\sigma = 0.025$ into (4) we obtain a record number of 166.59, in very
close agreement with the observed value. The comparison with Monte Carlo simulations of
biased random walks with the same drift shows that this accuracy is actually fortuitous, but
the description of the stock market data by the biased random walk model is nevertheless
quite reasonable (Fig. 6).
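The two theoretical numbers quoted in this paragraph can be reproduced directly from Eq. (4), $m_n(c) \approx 2\sqrt{n/\pi} + (c/\sigma)(\sqrt{2}/\pi)\left(n\arctan(\sqrt{n}) - \sqrt{n}\right)$. The following sketch evaluates it for $n = 5000$ and $c/\sigma = 0.025$; the function names are illustrative.

```python
from math import atan, pi, sqrt


def mean_records_symmetric(n):
    """Large-n mean record number of the symmetric walk, 2 sqrt(n / pi)."""
    return 2.0 * sqrt(n / pi)


def mean_records_small_drift(n, drift_ratio):
    """Eq. (4); drift_ratio is the normalized drift c / sigma."""
    correction = drift_ratio * (sqrt(2.0) / pi) * (n * atan(sqrt(n)) - sqrt(n))
    return mean_records_symmetric(n) + correction


if __name__ == "__main__":
    print(round(mean_records_symmetric(5000), 2))           # 79.79
    print(round(mean_records_small_drift(5000, 0.025), 2))  # 166.59
```

As stated in the text, the close match of the second number with the observed 166.56 upper records should not be over-interpreted, since $c\sqrt{n}/\sigma \approx 1.8$ lies outside the range of validity of Eq. (4).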

FIG. 7: Mean number of records in subsequences of the time series taken from the S&P 500 index.
The entire data set of 5000 consecutive daily values was split into 50 subsequences of length 100.
For each of the subsequences a linear detrending of the logarithm of the daily values was performed
and the upper and lower record numbers were determined from the detrended data. The results,
averaged over all stocks and all subsequences, are given by the thick black line (upper records)
and the thick dashed line (lower records). The thin dashed line shows the analytical prediction for
a symmetric random walk $m_n(0) = 2\sqrt{n/\pi}$. The number of upper records is in good agreement
with $m_n(0)$, but the number of lower records is significantly reduced.

Next we detrended the data by subtracting the fitted linear trend from the logarithmic
stock prices, and counted the number of records in the detrended time series. We found an
average number of 75.79 upper records after 5000 steps, in close agreement with the result
for a symmetric random walk. However, the number of lower records was only 53.65, which
is significantly smaller than expected. This residual asymmetry between upper and lower
records persists if, instead of subtracting an overall linear trend, the data are detrended
by normalizing each stock by the index [19]. To further explore this phenomenon we split
the time series into 50 shorter series each lasting 100 trading days. We detrended each of
the shorter time series individually by subtracting a linear trend, counted the number of
upper and lower records, and then averaged the record numbers over the whole ensemble
of $50 \times 366$ series of length 100. The results are shown in Fig. 7. It appears that while
the number of upper records is in very good agreement with the symmetric random walk
model, the number of lower records is still suppressed. This effect was found for different
choices of the lengths of the time series and appears to be independent of this choice.
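The detrending and counting procedure described above can be sketched as follows. This is a minimal illustration in plain Python, not the analysis code used for the S&P 500 data: the function names are hypothetical, the first entry of each series is counted as both an upper and a lower record (an assumed convention), the linear trend is fitted by ordinary least squares against the day index, and the example input is a synthetic Gaussian walk rather than stock data.

```python
import random


def count_records(series):
    """Return (upper, lower) record counts of a sequence.

    Assumed convention: the first entry counts as both an upper and a lower record.
    """
    upper = lower = 1
    running_max = running_min = series[0]
    for value in series[1:]:
        if value > running_max:
            upper += 1
            running_max = value
        if value < running_min:
            lower += 1
            running_min = value
    return upper, lower


def detrend_linear(values):
    """Subtract the least-squares straight line fitted against the index 0..n-1."""
    n = len(values)
    t_mean = (n - 1) / 2.0
    v_mean = sum(values) / n
    s_tt = sum((t - t_mean) ** 2 for t in range(n))
    s_tv = sum((t - t_mean) * (v - v_mean) for t, v in enumerate(values))
    slope = s_tv / s_tt
    return [v - (v_mean + slope * (t - t_mean)) for t, v in enumerate(values)]


def mean_records_in_windows(log_prices, window=100):
    """Average (upper, lower) record counts over detrended non-overlapping windows."""
    counts = [
        count_records(detrend_linear(log_prices[start:start + window]))
        for start in range(0, len(log_prices) - window + 1, window)
    ]
    mean_upper = sum(u for u, _ in counts) / len(counts)
    mean_lower = sum(l for _, l in counts) / len(counts)
    return mean_upper, mean_lower


if __name__ == "__main__":
    random.seed(1)
    # Synthetic "log-prices": Gaussian walk with drift 0.025, for illustration only
    walk = [0.0]
    for _ in range(4999):
        walk.append(walk[-1] + random.gauss(0.025, 1.0))
    print(count_records(walk))
    print(mean_records_in_windows(walk, window=100))
```

For real data, `log_prices` would hold the logarithms of the daily closing values of one stock, and the window averages would additionally be averaged over all stocks.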

Qualitatively, a reduced number of lower records indicates that the positive first-passage
times are increased compared to the corresponding negative first passage times. An asymmetry
between first passage times to a prescribed (positive or negative) return level has in
fact been observed in previous analyses of stock market data, and is known as the gain-loss
asymmetry [20-23]. However, this phenomenon differs in several important respects from
the one reported here. First, in most (though not all [23]) cases the sign of the asymmetry
is opposite to that suggested by the asymmetry in the record statistics, in that first passage
times for crossing a prescribed level from below are larger than for crossings from above
[22]. Second, the asymmetry vanishes when the prescribed return level tends to zero,
which is the relevant limit for the analysis of records. Finally, in contrast to the asymmetry
between upper and lower records reported here, the gain-loss asymmetry is a property of
entire stock indices which does not occur in individual stocks [21, 22]. Indeed, a preliminary
analysis of first-passage times to the origin in the detrended S&P 500 data shows an
asymmetry between positive and negative excursions only when the starting point of the
excursion is conditioned to be a record event [25]. An explanation of the observed residual
asymmetry between upper and lower records must therefore be left to future work.



V. SUMMARY 



In conclusion, using the methods introduced in [1] and a more general form of the Sparre
Andersen theorem [4, 8], we were able to describe the effect of a linear drift on the record
statistics of a Gaussian random walk in two regimes. For short times $n \ll (\sigma/c)^2$ we find
that the correction to the record rate $P_n(c) - P_n(0)$ increases proportional to $\arctan(\sqrt{n})$ and
then saturates at a value of $c/(\sqrt{2}\sigma)$. On the other hand, for large $n$ the record rate saturates
at a constant limiting value $P(c)$, which is linear in $c$ for $c/\sigma \ll 1$ and approaches unity for
large $c/\sigma$ according to Eq. (24). The transition between the two regimes is described by the
scaling form (26).

We applied our results to the statistics of records in 366 stocks contained in the S&P 500
index from 1990 to 2009. We found that, after detrending, the number of upper records in
the stocks is basically identical to that predicted for the symmetric random walk. The fact
that the number of lower records appears to be systematically decreased is interesting and
needs to be examined more thoroughly in the future. On the theoretical side, a possible
topic for future research is the record statistics of asymmetric random walks with a more
complicated asymmetry than just a constant drift. The issue of asymmetric random walks
with discrete jump distributions is also still open for further investigations.